Abstract

We establish sufficient conditions on durations that are stationary with finite variance and memory
parameter $d \in [0, 1/2)$ to ensure that the corresponding counting process $N(t)$ satisfies
$\mathrm{Var}\,N(t) \sim C t^{2d+1}$ ($C > 0$) as $t \to \infty$, with the same memory parameter $d \in [0, 1/2)$ that was
assumed for the durations. Thus, these conditions ensure that the memory in durations propagates to the
same memory parameter in counts and therefore in realized volatility. We then show that any
Autoregressive Conditional Duration ACD(1,1) model with a sufficient number of finite moments yields
short memory in counts, while any Long Memory Stochastic Duration model with $d > 0$ and all finite
moments yields long memory in counts, with the same $d$. Finally, we present a result implying that the
only way for a series of counts aggregated over a long time period to have nontrivial autocorrelation is
for the short-term counts to have long memory. In other words, aggregation ultimately destroys all
autocorrelation in counts, if and only if the counts have short memory.
I Introduction



There is a growing literature on long memory in volatility of financial time series. See, e.g., Robinson 
(1991), Bollerslev and Mikkelsen (1996), Robinson and Henry (1999), Deo and Hurvich (2001), Hurvich, 
Moulines and Soulier (2005). Long memory in volatility, which has been repeatedly found in the empirical 
literature, plays a key role in the forecasting of realized volatility (Andersen, Bollerslev, Diebold and 
Labys 2001, Deo, Hurvich and Lu 2005), and has important implications for option pricing (see Comte
and Renault 1998).

Given the increasing availability of transaction-level data, it is of interest to explain phenomena
observed at longer time scales from equally-spaced returns in terms of more fundamental properties at the
transaction level. Engle and Russell (1998) proposed the Autoregressive Conditional Duration (ACD)
model to describe the durations between trades, and briefly explored the implications of this model on
volatility of returns in discrete time, though they did not determine the persistence of this volatility, as
measured, say, by the decay rate of the autocorrelations of the squared returns. Deo, Hsieh and Hurvich
(2005) proposed the Long-Memory Stochastic Duration (LMSD) model, and began an empirical and
theoretical exploration of the question as to which properties of durations lead to long memory in volatility,
though the theoretical results presented there were not definitive.

The collection of time points $\cdots < t_{-1} < t_0 \le 0 < t_1 < t_2 < \cdots$ at which a transaction (say, a trade of a
particular stock on a specific market) takes place comprises a point process, a fact which was exploited
by Engle and Russell (1998). These event times $\{t_k\}$ determine a counting process,

$$N(t) = \text{Number of Events in } (0, t].$$

For any fixed time spacing $\Delta t > 0$, one can define the counts $\Delta N_{t'} = N(t'\Delta t) - N((t'-1)\Delta t)$, the
number of events in the $t'$-th time interval of width $\Delta t$, where $t' = 1, 2, \ldots$. The event times $\{t_k\}_{k=-\infty}^{\infty}$
also determine the durations, given by $\{\tau_k\}_{k=-\infty}^{\infty}$, $\tau_k = t_k - t_{k-1}$.
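The relation between event times, durations and binned counts can be illustrated with a minimal sketch. The function name `durations_and_counts` and the sample event times used to exercise it are hypothetical and only serve to make the definitions above concrete; they are not taken from the paper.

```python
import numpy as np

def durations_and_counts(event_times, dt, n_bins):
    """Durations tau_k = t_k - t_{k-1} and counts in (0, dt], (dt, 2dt], ...

    Hypothetical helper for illustration only.
    """
    t = np.sort(np.asarray(event_times, dtype=float))
    durations = np.diff(t)                      # tau_k = t_k - t_{k-1}
    edges = dt * np.arange(n_bins + 1)          # 0, dt, 2dt, ..., n_bins*dt
    positive = t[t > 0]                         # N(t) counts events in (0, t]
    # side="right" counts events with time <= edge, i.e. N(edge)
    n_at_edges = np.searchsorted(positive, edges, side="right")
    counts = np.diff(n_at_edges)                # N(i*dt) - N((i-1)*dt)
    return durations, counts
```

For example, with event times 0.5, 1.25, 2.75, 3.0, 5.5 and a bin width of 2, the three intervals (0, 2], (2, 4], (4, 6] contain 2, 2 and 1 events.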

Both the ACD and LMSD models imply that the doubly infinite sequence of durations $\{\tau_k\}_{k=-\infty}^{\infty}$ is
a stationary time series, i.e., there exists a probability measure $P^0$ under which the joint distribution of
any subcollection of the $\{\tau_k\}$ depends only on the lags between the entries. On the other hand, a point
process $N$ on the real line is stationary under the measure $P$ if $P(N(A)) = P(N(A + c))$ for all real $c$.
A fundamental fact about point processes is that in general (a notable exception is the Poisson process)
there is no single measure under which both the point process $N$ and the durations $\{\tau_k\}$ are stationary,
i.e., in general $P$ and $P^0$ are not the same. Nevertheless, there is a one-to-one correspondence between
the class of measures $P^0$ that determine a stationary duration sequence and the class of measures $P$ that
determine a stationary point process. The measure $P^0$ corresponding to $P$ is called the Palm distribution.
The counts are stationary under $P$, while the durations are stationary under $P^0$.

Deo, Hsieh and Hurvich (2005) pointed out, using a theorem of Daley, Rolski and Vesilo (2000), that if
durations are generated by an ACD model and if the durations have tail index $\kappa \in (1, 2)$ under $P^0$, then
the resulting counting process $N(t)$ has long range count dependence with memory parameter $d \ge 1 - \kappa/2$,
in the sense that $\mathrm{Var}\,N(t) \sim C t^{1+2d}$ ($C > 0$) as $t \to \infty$, under $P$. This, together with the model for
returns at equally spaced intervals of time given in Deo, Hsieh and Hurvich (2005), implies that realized
volatility has long memory in the sense that the $n$-term partial sum of realized volatility has a variance
that scales as $C_2 n^{2d+1}$ as $n \to \infty$, where $C_2 > 0$. Deo, Hsieh and Hurvich (2005) also showed that if
durations are generated by an LMSD model with memory parameter $d$ under $P^0$ then counts have long
memory with memory parameter $d_{\mathrm{counts}} \ge d$, but unfortunately this conclusion was established only
under the duration-stationary measure $P^0$, and not under the count-stationary measure $P$. This gap can
be bridged using methods described in this paper. Still, the results we have described above merely give
lower bounds for the memory parameter in counts.

In this paper, we will establish sufficient conditions on durations that are stationary with finite variance
and memory parameter $d \in [0, 1/2)$ under $P^0$ to ensure that the corresponding counting process $N(t)$
satisfies $\mathrm{Var}\,N(t) \sim C t^{2d+1}$ ($C > 0$) as $t \to \infty$ under $P$, with the same memory parameter $d \in [0, 1/2)$ that
was assumed for the durations. Thus, these conditions ensure that the memory in durations propagates
to the same memory parameter in counts and therefore in realized volatility.

Next, we will verify that the sufficient conditions of our Theorem 1 are satisfied for the ACD(1,1)
model assuming finite $8 + \delta$ moment ($\delta > 0$) of the durations under $P^0$, and for the LMSD model with any
$d \in [0, 1/2)$ assuming that the multiplying shocks have all moments finite. Thus, any ACD(1,1) model
with a sufficient number of finite moments yields short memory in counts, while any LMSD model with
$d > 0$ and all finite moments yields long memory in counts. These results for the LMSD and ACD(1,1)
models are given in Theorems 2 and 3, respectively. Lemma 1, which is used in proving Theorem 2,
provides a Rosenthal-type inequality for moments of absolute standardized partial sums of durations
under the LMSD model, and is of interest in its own right.

Finally, we present a result (Theorem 4) implying that if counts have memory parameter $d \in [0, 1/2)$
then further aggregations of these counts to longer time intervals will have a lag-1 autocorrelation that
tends to $2^{2d} - 1$ as the level of aggregation grows. Interestingly, this limit is zero if and only if $d = 0$.
Thus, one of the important functions of long memory in counts is that it allows the counts to have a
non-vanishing autocorrelation even as $\Delta t$ grows, as was found by Deo, Hsieh and Hurvich (2005) to occur
in empirical data. By contrast, short memory in counts implies that counts at long time scales (large $\Delta t$)
are essentially uncorrelated, in contradiction to what is seen in actual data. To summarize, aggregation
ultimately destroys all autocorrelation in counts, if and only if the counts have short memory.

II Theorems on the propagation of the memory parameter 

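Let $E$, $E^0$, $\mathrm{Var}$, $\mathrm{Var}^0$ denote expectations and variances under $P$ and $P^0$, respectively. Define $\mu = E^0(\tau_k)$
and $\lambda = 1/\mu$. Our main theorem uses the assumption that $P^0$ is $\{\tau_k\}$-mixing, defined as follows. Let
$\mathcal{N} = \sigma(\{\tau_k\}_{k=-\infty}^{\infty})$ and $\mathcal{F}_n = \sigma(\{\tau_k\}_{k=n}^{\infty})$. We say that $P^0$ is $\{\tau_k\}$-mixing if

$$\lim_{n \to \infty} \sup_{B \in \mathcal{F}_n} |P^0(A \cap B) - P^0(A)P^0(B)| = 0$$

for all $A \in \mathcal{N}$.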

Theorem 1 Let $\{\tau_k\}$ be a duration process such that the following conditions hold:

i) $\{\tau_k\}$ is stationary under $P^0$.

ii) $P^0$ is $\{\tau_k\}$-mixing.

iii) $\exists\, d \in [0, \tfrac{1}{2})$ such that

$$Y_n(s) = \frac{\sum_{k=1}^{\lfloor ns \rfloor} (\tau_k - \mu)}{n^{1/2+d}}, \qquad s \in [0, 1],$$

converges weakly to $\sigma B_{1/2+d}(\cdot)$ under $P^0$, where $\sigma > 0$ and $B_{1/2+d}(\cdot)$ is fractional Brownian motion if
$0 < d < \tfrac{1}{2}$ or standard Brownian motion $B_{1/2} = B$ if $d = 0$.

iv)

$$\sup_n E^0 \left| \frac{\sum_{k=1}^{n} (\tau_k - \mu)}{n^{1/2+d}} \right|^p < \infty \quad \text{for all } p > 0.$$

Then the induced counting process $N(t)$ satisfies $\mathrm{Var}\,N(t) \sim C t^{2d+1}$ under $P$ as $t \to \infty$ where $C > 0$.

Remark: Inspection of the proof of Theorem 1 reveals that if $d > 0$, only $4/(0.5 - d) + \delta$ finite moments
are needed, where $\delta > 0$ is arbitrarily small. The closer $d$ is to 1/2, the larger the number of finite
moments required.

Remark: As pointed out by Nieuwenhuis (1989), if $\{\tau_k\}$ is strong mixing under $P^0$ then $P^0$ is
$\{\tau_k\}$-mixing. This weaker form of mixing is essential for our purposes since even Gaussian long-memory
processes are not strong mixing. See Guegan and Ladoucette (2001).
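The moment requirement quoted in the first remark to Theorem 1 is easy to tabulate. The following is a minimal sketch; the helper name `moments_needed` is hypothetical, and it simply evaluates the expression $4/(0.5 - d)$ from the remark (the arbitrarily small $\delta$ is left out).

```python
def moments_needed(d):
    """Evaluate 4 / (0.5 - d), the moment order quoted in the remark
    to Theorem 1 (before adding an arbitrarily small delta > 0).

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= d < 0.5:
        raise ValueError("d must lie in [0, 1/2)")
    return 4.0 / (0.5 - d)

# The requirement grows without bound as d approaches 1/2.
for d in (0.0, 0.25, 0.4, 0.45):
    print(d, moments_needed(d))
```

Evaluated at $d = 0$ the expression equals 8, which agrees with the $8 + \delta$ moments assumed for the ACD(1,1) model in this paper.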




A LMSD Process

Define the LMSD process $\{\tau_k\}_{k=-\infty}^{\infty}$ for $d \in [0, \tfrac{1}{2})$ as

$$\tau_k = e^{h_k} \epsilon_k$$

where under $P^0$ the $\epsilon_k \ge 0$ are i.i.d. with all moments finite, and $h_k = \sum_{j=0}^{\infty} b_j e_{k-j}$, the $\{e_k\}$ are i.i.d.
Gaussian with zero mean, independent of $\{\epsilon_k\}$, and

$$b_j \sim \begin{cases} C j^{d-1} & \text{if } d \in (0, \tfrac{1}{2}) \\ C a^j, \ |a| < 1 & \text{if } d = 0 \end{cases}$$

($C \ne 0$) as $j \to \infty$. Note that for convenience, we nest the short-memory case ($d = 0$) within the LMSD
model, so that the allowable values for $d$ in this model are $0 \le d < 1/2$.
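To make the LMSD definition concrete, here is a minimal simulation sketch. Several choices in it are assumptions made only for illustration and are not prescribed by the paper: the coefficients are set to $b_0 = 1$, $b_j = j^{d-1}$ (or $b_j = 0.5^j$ when $d = 0$), the infinite moving average is truncated at `trunc` lags, and the multiplying shocks are taken to be unit-mean exponential.

```python
import numpy as np

def simulate_lmsd(n, d, sigma_e=0.3, trunc=2000, seed=0):
    """Simulate n durations tau_k = exp(h_k) * eps_k (illustrative sketch).

    Assumptions (not from the paper): b_0 = 1 and b_j = j**(d-1) for d > 0,
    b_j = 0.5**j for d = 0, a moving average truncated at `trunc` lags,
    and unit-mean exponential multiplying shocks.
    """
    rng = np.random.default_rng(seed)
    if d > 0:
        j = np.arange(1, trunc + 1)
        b = np.concatenate(([1.0], j ** (d - 1.0)))   # hyperbolic decay
    else:
        b = 0.5 ** np.arange(trunc + 1)               # geometric decay
    e = rng.normal(0.0, sigma_e, size=n + trunc)      # Gaussian innovations
    # 'valid' convolution gives h_k = sum_{j=0}^{trunc} b_j e_{k-j}, length n
    h = np.convolve(e, b, mode="valid")
    eps = rng.exponential(1.0, size=n)                # nonnegative i.i.d. shocks
    return np.exp(h) * eps
```

Truncating the moving average only approximates the long-memory behaviour, so a larger `trunc` is advisable when long series are simulated.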

Theorem 2 If the durations $\{\tau_k\}$ are generated by the LMSD process with $d \in [0, 1/2)$, then the induced
counting process $N(t)$ satisfies $\mathrm{Var}\,N(t) \sim C t^{2d+1}$ under $P$ as $t \to \infty$ where $C > 0$.

To establish Theorem 2, we will use the following Rosenthal-type inequality.

Lemma 1 For durations $\{\tau_k\}$ generated by the LMSD process with $d \in [0, \tfrac{1}{2})$, for any fixed positive
integer $p \ge 2$, $E^0\{|y_n - E^0(y_n)|^p\}$ is bounded uniformly in $n$, where

$$y_n = \frac{\sum_{k=1}^{n} \tau_k}{n^{1/2+d}}.$$

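B ACD(1,1) Process

Define the ACD(1,1) process $\{\tau_k\}_{k=-\infty}^{\infty}$ as

$$\tau_k = \psi_k \epsilon_k, \qquad \psi_k = \omega + \alpha \tau_{k-1} + \beta \psi_{k-1}$$

with $\omega > 0$, $\alpha > 0$, $\beta \ge 0$ and $\alpha + \beta < 1$, where under $P^0$, $\epsilon_k \ge 0$ are i.i.d. with mean 1. We will assume
further that under $P^0$, $\epsilon_k$ has a density $g_\epsilon$ such that $\int_0^\theta g_\epsilon(x)\,dx > 0$, $\forall \theta > 0$, and $E^0(\tau_k^{8+\delta}) < \infty$ for some
$\delta > 0$.

Nelson (1990) guarantees the existence of the doubly-infinite ACD(1,1) process $\{\tau_k\}_{k=-\infty}^{\infty}$, which in
our terminology is stationary under $P^0$.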

Theorem 3 Suppose that the durations $\{\tau_k\}$ are generated by the ACD(1,1) model, with the additional
assumptions stated above. Then the induced counting process $N(t)$ satisfies $\mathrm{Var}\,N(t) \sim Ct$ under $P$ as
$t \to \infty$ where $C > 0$.
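The ACD(1,1) recursion can be simulated directly. The sketch below is illustrative only: the function name `simulate_acd11`, the unit-mean exponential innovations, the burn-in length and the parameter values in the usage note are all assumptions, not choices made in the paper.

```python
import numpy as np

def simulate_acd11(n, omega, alpha, beta, burn_in=1000, seed=0):
    """Simulate n durations from tau_k = psi_k * eps_k with
    psi_k = omega + alpha * tau_{k-1} + beta * psi_{k-1}.

    Assumption for illustration: eps_k are i.i.d. exponential with mean 1.
    """
    if not (omega > 0 and alpha > 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha > 0, beta >= 0, alpha + beta < 1")
    rng = np.random.default_rng(seed)
    eps = rng.exponential(1.0, size=n + burn_in)
    mean_tau = omega / (1.0 - alpha - beta)   # stationary mean of tau_k
    psi, tau_prev = mean_tau, mean_tau        # start the recursion at the mean
    out = np.empty(n + burn_in)
    for k in range(n + burn_in):
        psi = omega + alpha * tau_prev + beta * psi
        tau_prev = psi * eps[k]
        out[k] = tau_prev
    return out[burn_in:]                      # discard the burn-in segment
```

With, say, `omega=0.1`, `alpha=0.1`, `beta=0.8`, the stationary mean is $\omega/(1-\alpha-\beta) = 1$, and the sample mean of a long simulated path should be close to that value.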
III Autocorrelation of Aggregated Counts



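Theorem 4 Let $\{X_t\}$ be a stationary process such that $\mathrm{Var}(\sum_{t=1}^{n} X_t) \sim C n^{1+2d}$ as $n \to \infty$, where $C \ne 0$
and $d \in [0, 1/2)$. Then

$$\lim_{n \to \infty} \mathrm{Corr}\left[ \sum_{t=1}^{n} X_t, \ \sum_{t=n+1}^{2n} X_t \right] = 2^{2d} - 1.$$

Proof:

$$\mathrm{Var}\left( \sum_{t=1}^{2n} X_t \right) = 2\, \mathrm{Var}\left( \sum_{t=1}^{n} X_t \right) + 2\, \mathrm{Cov}\left( \sum_{t=1}^{n} X_t, \ \sum_{t=n+1}^{2n} X_t \right).$$

Thus, since by stationarity the two block sums have the same variance,

$$\mathrm{Corr}\left[ \sum_{t=1}^{n} X_t, \ \sum_{t=n+1}^{2n} X_t \right]
= \frac{\mathrm{Cov}\left( \sum_{t=1}^{n} X_t, \ \sum_{t=n+1}^{2n} X_t \right)}{\mathrm{Var}\left( \sum_{t=1}^{n} X_t \right)}
= \frac{.5\, \mathrm{Var}\left( \sum_{t=1}^{2n} X_t \right) - \mathrm{Var}\left( \sum_{t=1}^{n} X_t \right)}{\mathrm{Var}\left( \sum_{t=1}^{n} X_t \right)}
= \frac{\mathrm{Var}\left( \sum_{t=1}^{2n} X_t \right)}{2\, \mathrm{Var}\left( \sum_{t=1}^{n} X_t \right)} - 1.$$

The result follows by noting that $\lim_{n \to \infty} n^{-1-2d}\, \mathrm{Var}(\sum_{t=1}^{n} X_t) = C$, where $C \ne 0$. □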



This theorem has an interesting practical interpretation. If we write $X_k = N[k\Delta t] - N[(k-1)\Delta t]$
where $\Delta t > 0$ is fixed, then $X_k$ represents the number of events (count) in a time interval of width $\Delta t$,
e.g. one minute. Thus, $\sum_{k=1}^{n} X_k$ is the number of events in a time interval of length $n$ minutes, e.g. one
day. The theorem implies that as the level of aggregation ($n$) increases, the lag-1 autocorrelation of the
aggregated counts will approach a nonzero constant if and only if the non-aggregated count series $\{X_k\}$
has long memory. In other words, the only way for a series of counts over a long time period to have
nontrivial autocorrelation is for the short-term counts to have long memory. Since in practice long-term
counts do have substantial autocorrelation (see Deo, Hsieh and Hurvich 2005), it is important to use
only the models for durations that imply long memory in the counting process (LRcD). Examples of such
models include the LMSD model (see Theorem 2) and ACD models with infinite variance (see Daley,
Rolski and Vesilo (2000), and Theorem 2 of Deo, Hsieh and Hurvich, 2005).
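The limit in Theorem 4 can be checked numerically on a process whose autocovariances are known in closed form. The sketch below uses the ARFIMA(0, d, 0) autocorrelation function purely as an illustrative stand-in for a long-memory series; this model and the function names are assumptions and do not appear in the paper. The lag-1 correlation of adjacent block sums is computed from the identity $\mathrm{Corr} = \mathrm{Var}(S_{2n}) / (2\,\mathrm{Var}(S_n)) - 1$ used in the proof.

```python
import numpy as np

def arfima_acf(d, max_lag):
    """Autocorrelations of ARFIMA(0, d, 0) via the standard recursion
    rho(k) = rho(k-1) * (k - 1 + d) / (k - d), rho(0) = 1."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    for k in range(1, max_lag + 1):
        rho[k] = rho[k - 1] * (k - 1 + d) / (k - d)
    return rho

def block_sum_variance(rho, n):
    """Var(X_1 + ... + X_n) for a unit-variance stationary series."""
    lags = np.arange(1, n)
    return n * rho[0] + 2.0 * np.sum((n - lags) * rho[1:n])

def lag1_corr_of_aggregates(d, n):
    """Correlation between sums over t = 1..n and t = n+1..2n."""
    rho = arfima_acf(d, 2 * n)
    v_n = block_sum_variance(rho, n)
    v_2n = block_sum_variance(rho, 2 * n)
    return v_2n / (2.0 * v_n) - 1.0

for d in (0.0, 0.1, 0.3, 0.45):
    print(d, lag1_corr_of_aggregates(d, 2000), 2 ** (2 * d) - 1)
```

For $d = 0$ the series is white noise and the correlation is zero at every aggregation level, while for $d > 0$ the computed value approaches $2^{2d} - 1$ as $n$ grows.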
IV Appendix: Proofs



Let $P$ denote the stationary distribution of the point process $N$ on the real line, and let $P^0$ denote the
corresponding Palm distribution. $P$ determines and is completely determined by the stationary
distribution $P^0$ of the doubly infinite sequence of durations. Note that the counting process $N$
is stationary under $P$, the durations are stationary under $P^0$, but in general there is no single
distribution under which both the counting process and the durations are stationary. For more details on the
correspondence between $P$ and $P^0$, see Daley and Vere-Jones (2003), Baccelli and Bremaud (2003), or
Nieuwenhuis (1989).

Following the standard notation for point processes on the real line (see, e.g., Nieuwenhuis 1989, p.
594), we assume that the event times $\{t_k\}_{k=-\infty}^{\infty}$ satisfy

$$\cdots < t_{-1} < t_0 \le 0 < t_1 < t_2 < \cdots.$$

Let

$$u_k = \begin{cases} t_1 & \text{if } k = 1 \\ \tau_k & \text{if } k \ge 2. \end{cases}$$

Here, the random variable $t_1 > 0$ is the time of occurrence of the first event following $t = 0$. For $t > 0$,
define the count on the interval $(0, t]$, $N(t) := N(0, t]$, by

$$N(t) = \begin{cases} \max\{s : \sum_{i=1}^{s} u_i \le t\}, & u_1 \le t \\ 0, & u_1 > t. \end{cases}$$
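The definition of $N(t)$ in terms of the sequence $u_1, u_2, \ldots$ translates directly into a cumulative sum followed by a search. This is a minimal sketch with a hypothetical function name, assuming `u` holds $u_1$ (the time to the first event after zero) followed by the subsequent durations.

```python
import numpy as np

def count_process(u, t):
    """N(t) = max{s : u_1 + ... + u_s <= t}, or 0 if u_1 > t.

    `u` is assumed to hold u_1 followed by the later durations u_2, u_3, ...
    """
    arrival_times = np.cumsum(np.asarray(u, dtype=float))   # t_1, t_2, ...
    # side="right" returns the number of arrival times that are <= t
    return int(np.searchsorted(arrival_times, t, side="right"))
```

For instance, with `u = [0.5, 1.0, 0.25]` the events occur at 0.5, 1.5 and 1.75, so the count is 0 just before 0.5, 2 at time 1.5 and 3 from time 1.75 onward.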

Throughout the paper, the symbol $\Rightarrow$ denotes weak convergence in the space $D[0, 1]$.

Proof of Theorem 1

By assumption iii), $Y_n \Rightarrow \sigma B_{1/2+d}$ under $P^0$, where $\sigma > 0$. First, we will apply Theorem 6.3 of
Nieuwenhuis (1989) to the durations $\{\tau_k\}_{k=-\infty}^{\infty}$ to conclude that $Y_n \Rightarrow \sigma B_{1/2+d}$ under $P$. Since the
$\{\tau_k\}_{k=-\infty}^{\infty}$ are stationary under $P^0$ and are generated by the shift to the first event following time zero
(see Nieuwenhuis 1989, p. 600), and since we have assumed that $P^0$ is $\{\tau_k\}$-mixing, his Theorem 6.3
applies. It follows that $Y_n \Rightarrow \sigma B_{1/2+d}$ under $P$. We next show that the suitably normalized counting
process converges to the same limit under $P$.

Define

$$\tilde{Y}_n(s) = \frac{\sum_{k=1}^{\lfloor ns \rfloor} (u_k - \mu)}{n^{1/2+d}}, \qquad s \in [0, 1].$$

Note that for all $s$, $\tilde{Y}_n(s) = Y_n(s) + n^{-(1/2+d)}(u_1 - \tau_1)$. From Baccelli and Bremaud (2003, Equation
1.4.2, page 33), for any measurable function $h$,

$$E[h(\tau_1)] = \lambda E^0[\tau_1 h(\tau_1)]. \qquad (1)$$

Since $u_1 \le \tau_1$, and since assumption iv) implies that $\tau_1$ has finite variance under $P^0$, using $h(x) = x$ in
(1), it follows that $n^{-(1/2+d)}(u_1 - \tau_1)$ is $o_p(1)$ under $P$. Thus, $\tilde{Y}_n \Rightarrow \sigma B_{1/2+d}$ under $P$.

Let

$$Z(t) = \frac{N(t) - t/\mu}{t^{1/2+d}}. \qquad (2)$$

By Iglehart and Whitt (1971, Theorem 1), it follows that $Z(t) \xrightarrow{d} \tilde{C} B_{1/2+d}(1)$ under $P$ as $t \to \infty$,
where $\tilde{C} > 0$. Furthermore, by Lemma 2, $Z^2(t)$ is uniformly integrable under $P$ and hence $\lim_t \mathrm{Var}[Z(t)] =
\tilde{C}^2\, \mathrm{Var}[B_{1/2+d}(1)]$. The theorem is proved. □

Proof of Theorem 2

We simply verify that the conditions of Theorem 1 hold for this process.

By definition $\{\tau_k\}$ is stationary under $P^0$ and by Lemma ?? $P^0$ is $\{\tau_k\}$-mixing. By Surgailis and
Viano (2002), $Y_n \Rightarrow \sigma B_{1/2+d}$ under $P^0$, where $\sigma > 0$, and by Lemma 1,

$$\sup_n E^0 \left| \frac{\sum_{k=1}^{n} (\tau_k - \mu)}{n^{1/2+d}} \right|^p < \infty$$

for all $p$. Thus, the result is proved. □

Proof of Theorem 3

We simply verify that the conditions of Theorem 1 hold for this process.
By Lemma ??, $\{\tau_k\}$ is exponential $\alpha$-mixing, and hence strong mixing and thus by Nieuwenhuis (1989),
$P^0$ is $\{\tau_k\}$-mixing. Furthermore, since all moments of $\tau_k$ exist up to order $8 + \delta$, $\delta > 0$, we can apply
results from Doukhan (1994) to obtain

$$Y_n \Rightarrow C B, \qquad (3)$$

if $\frac{1}{n} \mathrm{var}(\sum_{k=1}^{n} \tau_k) \to C^2 > 0$, as $n \to \infty$.

It is well known that the GARCH(1,1) model can be represented as an ARMA(1,1) model, see Tsay
(2002). Similarly, the ACD(1,1) model can also be re-formulated as an ARMA(1,1) model,

$$\tau_k = \omega + (\alpha + \beta)\tau_{k-1} + (\eta_k - \beta \eta_{k-1}) \qquad (4)$$

where $\eta_k = \tau_k - \psi_k$ is white noise with finite variance since $E^0(\tau_k^{8+\delta}) < \infty$. The autoregressive and moving
average parameters of the resulting ARMA(1,1) model are $(\alpha + \beta)$ and $\beta$, respectively.

It is also known that for any stationary invertible ARMA model $\{z_k\}$, $n\, \mathrm{var}(\bar{z}) \to 2\pi f_z(0)$, where
$f_z(0)$ is the spectral density of $\{z_k\}$ at zero frequency. For an ARMA(1,1) process, $f_z(0) > 0$ if the
moving average coefficient is less than 1. Here, since $0 \le \beta < 1$, we obtain $\frac{1}{n} \mathrm{var}(\sum_{k=1}^{n} \tau_k) = n\, \mathrm{var}(\bar{\tau}) \to
2\pi f_\tau(0) > 0$, as $n \to \infty$. Therefore (3) follows.
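The long-run variance $2\pi f_\tau(0)$ of the ARMA(1,1) representation (4) has the standard closed form $\sigma_\eta^2 (1-\beta)^2 / (1-\alpha-\beta)^2$, which is strictly positive whenever $\beta < 1$. The sketch below checks this against the squared sum of the MA($\infty$) weights; the function names and the default innovation variance `sigma_eta2` are hypothetical and serve only as an illustration.

```python
import numpy as np

def acd_long_run_variance(alpha, beta, sigma_eta2=1.0):
    """2*pi*f(0) for the ARMA(1,1) form with AR coefficient alpha + beta
    and MA coefficient beta (closed form)."""
    return sigma_eta2 * (1.0 - beta) ** 2 / (1.0 - alpha - beta) ** 2

def long_run_variance_from_ma(alpha, beta, sigma_eta2=1.0, n_terms=5000):
    """Same quantity from the MA(infinity) weights psi_0 = 1,
    psi_j = (phi - theta) * phi**(j - 1), truncated at n_terms."""
    phi, theta = alpha + beta, beta
    j = np.arange(1, n_terms)
    psi = np.concatenate(([1.0], (phi - theta) * phi ** (j - 1)))
    return sigma_eta2 * psi.sum() ** 2
```

Both routes give the same positive value, for example 4 (in units of `sigma_eta2`) when `alpha = 0.1` and `beta = 0.8`.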

Define $y_n = n^{-1/2} \sum_{k=1}^{n} \tau_k$. Since all moments of $\tau_k$ are bounded up to order $8 + \delta$ ($\delta > 0$) under $P^0$,
by Yokoyama (1980), we obtain

$$E^0\{|y_n - E^0(y_n)|^{8+\delta}\} \le K < \infty, \qquad \delta > 0 \qquad (5)$$

uniformly in $n$, provided that $\{\tau_k\}$ is exponential $\alpha$-mixing, which is proved in Lemma ??.

Therefore, we can apply Theorem 1 to the ACD(1,1) model and the result follows. □

Proof of Lemma 1

We present the proof for the case $0 < d < \tfrac{1}{2}$. The proof for the case $d = 0$ follows along similar lines.
Also, we assume here that $p$ is a positive even integer. The result for all positive odd integers follows by
Hölder's inequality.

Let $\tilde{y}_n = y_n - E^0(y_n)$. Since $p \ge 2$ is even and $E^0(\tilde{y}_n^{\,p})$ can be expressed as a linear combination of
the products of the joint cumulants of $\tilde{y}_n$ of order $2, \ldots, p$, we have

$$0 \le E^0|\tilde{y}_n|^p = E^0(\tilde{y}_n^{\,p}) = \sum_{\pi} \Big[ c_\pi \prod_{j \in \pi} \mathrm{cum}(\underbrace{\tilde{y}_n, \ldots, \tilde{y}_n}_{j \text{ terms}}) \Big]
\le \sum_{\pi} |c_\pi| \prod_{j \in \pi} |\mathrm{cum}(\underbrace{\tilde{y}_n, \ldots, \tilde{y}_n}_{j \text{ terms}})|$$

where $\pi$ ranges over the additive partitions of $p$ and $c_\pi$ is a finite constant depending on $\pi$.

Since the first order cumulant of $\tilde{y}_n$ is zero and for all integers $m \ge 2$, the $m$-th order cumulant of
$\tilde{y}_n$ is equal to that of $y_n$, it suffices to show that the absolute value of the $m$-th order cumulant of $y_n$ is
bounded uniformly in $n$ under $P^0$, for all $m \in \{2, \ldots, p\}$.

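We first consider the second and the third order cumulants.

For the second order cumulant ($m = 2$),

$$|\mathrm{cum}(y_n, y_n)| = \left| \mathrm{cum}\left( \frac{\sum_{k=1}^{n} \tau_k}{n^{1/2+d}}, \ \frac{\sum_{s=1}^{n} \tau_s}{n^{1/2+d}} \right) \right| \le \frac{1}{n^{1+2d}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(\tau_k, \tau_s)|.$$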

To calculate the joint cumulant $\mathrm{cum}(\tau_k, \tau_s)$, we briefly introduce some terminology, mainly cited from
Brillinger (1981): consider a (not necessarily rectangular) two-way table of indices,

$$\begin{array}{ccc} (1,1) & \ldots & (1, J_1) \\ \vdots & & \vdots \\ (I,1) & \ldots & (I, J_I) \end{array}$$

and a partition $P_1 \cup P_2 \cup \ldots \cup P_M$ of its entries. We say sets $P_{m'}$, $P_{m''}$ of the partition hook if there
exist $(i_1, j_1) \in P_{m'}$ and $(i_2, j_2) \in P_{m''}$ such that $i_1 = i_2$, i.e. at least one entry of $P_{m'}$ and one entry
of $P_{m''}$ come from the same row in the two-way table. We say that sets $P_{m'}$ and $P_{m''}$ communicate
if there exists a sequence of sets $P_{m_1} = P_{m'}, P_{m_2}, \ldots, P_{m_N} = P_{m''}$ such that $P_{m_n}$ and $P_{m_{n+1}}$ hook for
$n = 1, \ldots, N - 1$. So $P_{m'}$ and $P_{m''}$ communicate as long as one can find an ordered sequence of sets such
that all neighboring pairs hook, and this sequence links $P_{m'}$ and $P_{m''}$ together. Finally a partition is said
to be indecomposable if all sets in the partition communicate.
By Brillinger (1981), Theorem 2.3.2, for a two-way array of random variables $X_{ij}$, $j = 1, \ldots, J_i$,
$i = 1, \ldots, I$ (see the corresponding two-way table above), the joint cumulant of the $I$ row products

$$Y_i = \prod_{j=1}^{J_i} X_{ij}, \qquad i = 1, \ldots, I$$

is given by

$$\mathrm{cum}(Y_1, \ldots, Y_I) = \sum_{\nu} \mathrm{cum}(X_{ij};\ ij \in \nu_1) \cdots \mathrm{cum}(X_{ij};\ ij \in \nu_M)$$

where the summation is over all indecomposable partitions $\nu = \nu_1 \cup \ldots \cup \nu_M$ of the two-way table of indices.

It is more convenient to write the partitions in terms of symbols representing the random variables, 
instead of the indices themselves. We will always use distinct symbols, so that there is a one-to-one 
correspondence between the indices and the symbols. Nevertheless, the random variables represented by 
distinct symbols need not be distinct. For example, $e^{h_k}$ and $e^{h_s}$ are distinct symbols, but if $k = s$, they
are not different random variables. Ultimately, the cumulants are computed from the random variables.

To compute $\mathrm{cum}(\tau_k, \tau_s)$, we use the two-way table of indices (left) and the corresponding table of
symbols (right),

$$\begin{array}{cc} (1,1) & (1,2) \\ (2,1) & (2,2) \end{array} \qquad , \qquad \begin{array}{cc} e^{h_k} & \epsilon_k \\ e^{h_s} & \epsilon_s \end{array}$$

with $I = 2$, $J_1 = 2$ and $J_2 = 2$.

From Brillinger (1981), Theorem 2.3.1, all joint cumulants corresponding to partitions with at least
one of the symbols representing $\{e^{h_k}\}$ and at least one of the symbols representing $\{\epsilon_k\}$ in the same set
are zero because the corresponding random variable sequences are mutually independent. So for $m = 2$,
excluding those with at least one of $e^{h_k}, e^{h_s}$ and at least one of $\epsilon_k, \epsilon_s$ in the same set, the only possible
indecomposable partitions (here, the partition is given in terms of the symbols) are:

$$\{e^{h_k}, e^{h_s}\}, \{\epsilon_k, \epsilon_s\}$$
$$\{e^{h_k}, e^{h_s}\}, \{\epsilon_k\}, \{\epsilon_s\}$$
$$\{e^{h_k}\}, \{e^{h_s}\}, \{\epsilon_k, \epsilon_s\}$$
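Thus, $|\mathrm{cum}(y_n, y_n)| \le A + B + C$, where

$$A = \frac{1}{n^{1+2d}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s})|\, |\mathrm{cum}(\epsilon_k, \epsilon_s)|$$

$$B = \frac{1}{n^{1+2d}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(e^{h_k})\, \mathrm{cum}(e^{h_s})|\, |\mathrm{cum}(\epsilon_k, \epsilon_s)|$$

$$C = \frac{1}{n^{1+2d}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s})|\, |\mathrm{cum}(\epsilon_k)|\, |\mathrm{cum}(\epsilon_s)|$$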

Both $A$ and $B$ reduce to a single summation because of the serial independence of the $\{\epsilon_k\}$, so
$A = O(n^{-2d})$ and $B = O(n^{-2d})$. For $C$, by Surgailis and Viano (2002), Corollary 5.3,

$$|\mathrm{cum}(e^{h_k}, e^{h_s})| = e^{\sigma_h^2}\, |e^{r_{|k-s|}} - 1|$$

where $r_{|k-s|} = \mathrm{cov}(h_k, h_s)$ and $\sigma_h^2 = \mathrm{var}(h_k)$.
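For the second order case this identity can also be verified directly from the moment generating function of a bivariate Gaussian vector. The short derivation below is a sketch under the model's assumption that $(h_k, h_s)$ is jointly Gaussian with zero mean, common variance $\sigma_h^2$ and covariance $r_{|k-s|}$.

```latex
\begin{align*}
E^0\!\left(e^{h_k}\right) &= e^{\sigma_h^2/2}, \\
E^0\!\left(e^{h_k + h_s}\right) &= \exp\!\Big(\tfrac{1}{2}\,\mathrm{var}(h_k + h_s)\Big)
   = \exp\!\big(\sigma_h^2 + r_{|k-s|}\big), \\
\mathrm{cum}\!\left(e^{h_k}, e^{h_s}\right)
  &= E^0\!\left(e^{h_k + h_s}\right) - E^0\!\left(e^{h_k}\right) E^0\!\left(e^{h_s}\right)
   = e^{\sigma_h^2}\left(e^{r_{|k-s|}} - 1\right).
\end{align*}
```

Taking absolute values gives the expression used for term $C$.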

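By the assumption on $\{b_j\}$ in Theorem 2 it follows that $r_s \sim K s^{2d-1}$, as $s \to \infty$, where $K > 0$,
so that

$$\sum_{k=1}^{n} \sum_{s=1}^{n} |e^{r_{|k-s|}} - 1| \le 2 \sum_{k=1}^{n} \sum_{s > k} |e^{r_{|k-s|}} - 1| + n\,|e^{r_0} - 1|
\le K n \sum_{j=1}^{n} j^{2d-1} + n\,|e^{r_0} - 1| = O(n^{2d+1}).$$

Thus term $C$ is $O(1)$. Hence, $|\mathrm{cum}(y_n, y_n)|$ is $O(1)$.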
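The $O(n^{2d+1})$ growth of the double sum can be checked numerically. The sketch below assumes, purely for illustration, the exact form $r_j = K j^{2d-1}$ for $j \ge 1$ with hypothetical values $K = 0.2$ and $r_0 = 0.5$, and estimates the growth exponent from the ratio of the sums at $2n$ and $n$.

```python
import numpy as np

def double_sum(n, d, K=0.2, r0=0.5):
    """sum_{k,s=1}^n |exp(r_{|k-s|}) - 1| with r_j = K * j**(2d-1), j >= 1.

    K and r0 are hypothetical illustrative values.
    """
    j = np.arange(1, n)
    r = K * j ** (2.0 * d - 1.0)
    # lag j occurs (n - j) times on each side of the diagonal
    off_diagonal = 2.0 * np.sum((n - j) * np.abs(np.exp(r) - 1.0))
    return n * abs(np.exp(r0) - 1.0) + off_diagonal

def growth_exponent(n, d, **kwargs):
    """Local log-log slope of the double sum between n and 2n."""
    return np.log(double_sum(2 * n, d, **kwargs) / double_sum(n, d, **kwargs)) / np.log(2.0)

print(growth_exponent(4096, 0.3), 2 * 0.3 + 1)
```

With $d = 0.3$ the estimated exponent is close to $2d + 1 = 1.6$, so after dividing by $n^{1+2d}$ the term $C$ stays bounded.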
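Next, for the third order cumulant ($m = 3$), we have

$$|\mathrm{cum}(y_n, y_n, y_n)| = \frac{1}{n^{3d+3/2}} \Big| \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} \mathrm{cum}(\tau_k, \tau_s, \tau_u) \Big|
\le \frac{1}{n^{3d+3/2}} \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}\epsilon_k, e^{h_s}\epsilon_s, e^{h_u}\epsilon_u)|.$$

We will use the following two-way table:

$$\begin{array}{cc} (1,1) & (1,2) \\ (2,1) & (2,2) \\ (3,1) & (3,2) \end{array} \qquad , \qquad \begin{array}{cc} e^{h_k} & \epsilon_k \\ e^{h_s} & \epsilon_s \\ e^{h_u} & \epsilon_u \end{array}$$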
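For convenience, we group the indecomposable partitions according to how many sets ($L = 1, 2, 3$)
the symbols $e^{h_k}, e^{h_s}, e^{h_u}$ are partitioned into.

We have three groups of indecomposable partitions, excluding those with at least one of $e^{h_k}, e^{h_s}, e^{h_u}$
and at least one of $\epsilon_k, \epsilon_s, \epsilon_u$ in the same set:

i) Group 1

$$\{e^{h_k}, e^{h_s}, e^{h_u}\}, \{\epsilon_k, \epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_s}, e^{h_u}\}, \{\epsilon_k, \epsilon_s\}, \{\epsilon_u\}$$
$$\{e^{h_k}, e^{h_s}, e^{h_u}\}, \{\epsilon_k, \epsilon_u\}, \{\epsilon_s\}$$
$$\{e^{h_k}, e^{h_s}, e^{h_u}\}, \{\epsilon_k\}, \{\epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_s}, e^{h_u}\}, \{\epsilon_k\}, \{\epsilon_s\}, \{\epsilon_u\}$$

ii) Group 2

$$\{e^{h_k}, e^{h_s}\}, \{e^{h_u}\}, \{\epsilon_k, \epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_s}\}, \{e^{h_u}\}, \{\epsilon_k\}, \{\epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_s}\}, \{e^{h_u}\}, \{\epsilon_s\}, \{\epsilon_k, \epsilon_u\}$$

$$\{e^{h_k}, e^{h_u}\}, \{e^{h_s}\}, \{\epsilon_k, \epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_u}\}, \{e^{h_s}\}, \{\epsilon_k\}, \{\epsilon_s, \epsilon_u\}$$
$$\{e^{h_k}, e^{h_u}\}, \{e^{h_s}\}, \{\epsilon_u\}, \{\epsilon_k, \epsilon_s\}$$

$$\{e^{h_s}, e^{h_u}\}, \{e^{h_k}\}, \{\epsilon_k, \epsilon_s, \epsilon_u\}$$
$$\{e^{h_s}, e^{h_u}\}, \{e^{h_k}\}, \{\epsilon_u\}, \{\epsilon_k, \epsilon_s\}$$
$$\{e^{h_s}, e^{h_u}\}, \{e^{h_k}\}, \{\epsilon_s\}, \{\epsilon_k, \epsilon_u\}$$

iii) Group 3

$$\{e^{h_k}\}, \{e^{h_s}\}, \{e^{h_u}\}, \{\epsilon_k, \epsilon_s, \epsilon_u\}.$$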
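We next study the order of the dominant contribution to $|\mathrm{cum}(y_n, y_n, y_n)|$ corresponding to each
group.

In Group 1, the dominant term arises from the last partition since it yields a triple summation,

$$\frac{1}{n^{3d+3/2}} \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s}, e^{h_u})|\, |\mathrm{cum}(\epsilon_k)|\, |\mathrm{cum}(\epsilon_s)|\, |\mathrm{cum}(\epsilon_u)|
= \frac{\mu_\epsilon^3}{n^{3d+3/2}} \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s}, e^{h_u})|$$

where $\mu_\epsilon = E^0(\epsilon_1)$.

By Surgailis and Viano (2002), Corollary 5.3,

$$\sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s}, e^{h_u})|
\le \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-s|}} - 1|\, |e^{r_{|s-u|}} - 1|\, |e^{r_{|k-u|}} - 1|$$
$$+ \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-s|}} - 1|\, |e^{r_{|k-u|}} - 1|
+ \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-s|}} - 1|\, |e^{r_{|s-u|}} - 1|$$
$$+ \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-u|}} - 1|\, |e^{r_{|s-u|}} - 1|.$$

The last three summations are actually the same due to symmetry: we can simply relabel the indices
in the last summation by $s \leftrightarrow u$. As for the first summation, since $|r_{|k-u|}| = |\mathrm{cov}(h_k, h_u)| \le \sigma_h^2 = \mathrm{var}(h_k)$,
we have $|e^{r_{|k-u|}} - 1| \le (e^{\sigma_h^2} + 1) < \infty$. So

$$\sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s}, e^{h_u})|
\le K \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-s|}} - 1|\, |e^{r_{|s-u|}} - 1|
+ 3 \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} e^{\frac{3}{2}\sigma_h^2}\, |e^{r_{|k-s|}} - 1|\, |e^{r_{|s-u|}} - 1|$$
$$\le K \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |e^{r_{|k-s|}} - 1|\, |e^{r_{|s-u|}} - 1| \quad (\text{for some } K > 0)$$
$$= O(n^{4d+1}).$$

The last step follows from Lemma 3. So $n^{-(3d+3/2)} \sum_{k=1}^{n} \sum_{s=1}^{n} \sum_{u=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s}, e^{h_u})|$ converges to
zero because $(4d + 1) < (3d + \tfrac{3}{2})$.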
Similarly, the dominant contribution from Group 2 is of order $O(n^{-d-\frac{1}{2}})$.

Note that in Group 2, all three of $e^{h_k}, e^{h_s}, e^{h_u}$ are partitioned into two sets. Therefore, partitions with
all three of $\epsilon_k, \epsilon_s, \epsilon_u$ in different sets are not indecomposable, so the dominant contribution is a double
sum,

$$\frac{1}{n^{3d+3/2}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s})|\, |\mathrm{cum}(e^{h_s})|\, |\mathrm{cum}(\epsilon_k)|\, |\mathrm{cum}(\epsilon_s, \epsilon_s)|
= \frac{\mu_{e^h}\, \mu_\epsilon\, \mathrm{var}(\epsilon_1)}{n^{3d+3/2}} \sum_{k=1}^{n} \sum_{s=1}^{n} |\mathrm{cum}(e^{h_k}, e^{h_s})|$$

where $\mu_{e^h} = E^0(e^{h_1})$.

So the dominant term in Group 2 also converges to zero.

For Group 3, all three of $e^{h_k}, e^{h_s}, e^{h_u}$ are partitioned into three different sets, so that the part of the
partition involving $\epsilon_k, \epsilon_s, \epsilon_u$ must be $\{\epsilon_k, \epsilon_s, \epsilon_u\}$ in order to be indecomposable. The resulting summation
now is only a single one of order $O(n)$. The dominant contribution again converges to zero.

Notice that the order of the dominant contribution from group 3 ($O(n^{-3d-\frac{1}{2}})$) is of smaller order than
that from group 2 ($O(n^{-d-\frac{1}{2}})$), which is of smaller order than that from group 1 ($O(n^{d-\frac{1}{2}})$). This will be
shown to hold in general for any $m$-th order joint cumulant.

Next, we prove that the $m$-th order joint cumulant, which satisfies

$$|\mathrm{cum}(\underbrace{y_n, \ldots, y_n}_{m \text{ terms}})| \le \frac{1}{n^{m(d+\frac{1}{2})}} \sum_{k_1=1}^{n} \cdots \sum_{k_m=1}^{n} |\mathrm{cum}(e^{h_{k_1}}\epsilon_{k_1}, \ldots, e^{h_{k_m}}\epsilon_{k_m})|, \qquad (6)$$

converges to zero for all $m > 2$.

The indecomposable partitions of $(e^{h_{k_1}}\epsilon_{k_1}, \ldots, e^{h_{k_m}}\epsilon_{k_m})$ are organized in a similar manner as before
into $m$ groups, where in Group $L$ the symbols $e^{h_{k_1}}, \ldots, e^{h_{k_m}}$ are divided into $L$ sets ($L = 1, \ldots, m$).

a) First, consider Group 1. The dominant contribution to the righthand side of (6) corresponding
to Group 1 must be the one from the partition in which all of the symbols $e^{h_{k_1}}, \ldots, e^{h_{k_m}}$ are in one
set and each of the symbols $\epsilon_{k_1}, \ldots, \epsilon_{k_m}$ is in a set by itself. The resulting summation is an $m$-fold
summation. By Corollary 5.3 of Surgailis and Viano (2002), the absolute value of the $m$-th joint cumulant,
$|\mathrm{cum}(e^{h_{k_1}}, \ldots, e^{h_{k_m}})|$, is bounded by a summation taken over all connected graphs with $m$ vertices. Each
entry of the summation is a product of terms of the form $|e^{r_{|k_i - k_j|}} - 1|$ along the edges that connect
vertices $k_i$ and $k_j$ of a connected $m$-vertex graph.

For a graph with $m$ vertices, we need at least $(m - 1)$ edges to connect them. It is known (see
Andrasfai, 1977, Chapter 2) that any connected $m$-vertex graph with $(m - 1)$ edges may be represented
as a tree. Let $W_{\{k_i, \ldots, k_j\}} < \infty$ be the total number of trees with vertices labeled by $k_i, k_{i+1}, \ldots, k_j$.

If a connected $m$-vertex graph used in applying Corollary 5.3 of Surgailis and Viano (2002) has more
than $(m - 1)$ edges, it is not a tree, and there will be more than $(m - 1)$ terms of the form $|e^{r_{|k_i - k_j|}} - 1|$
being multiplied together in the $m$-fold summation in (6). But, for all $k_i, k_j$, $|r_{|k_i - k_j|}| = |\mathrm{cov}(h_{k_i}, h_{k_j})| \le
\sigma_h^2 = \mathrm{var}(h_{k_i})$, so $|e^{r_{|k_i - k_j|}} - 1| \le (e^{\sigma_h^2} + 1) < \infty$, and for any connected $m$-vertex graph with more than
$(m - 1)$ edges, there exists an $m$-vertex subgraph that has a tree representation. So we can retain a
product of $(m - 1)$ terms of the form $|e^{r_{|k_i - k_j|}} - 1|$ in the $m$-fold summation in (6) and move remaining
terms out of the summation, bounding each by $(e^{\sigma_h^2} + 1)$. The resulting product of $(m - 1)$ terms of the
form $|e^{r_{|k_i - k_j|}} - 1|$ is itself a product over the edges of an $m$-vertex tree.

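In all, $|\mathrm{cum}(e^{h_{k_1}}, \ldots, e^{h_{k_m}})|$ is bounded by a constant times a summation over the set $G_{\{k_1, \ldots, k_m\}}$ of
all $W_{\{k_1, \ldots, k_m\}}$ trees. Each entry of the summation is a product of terms of the form $|e^{r_{|k_i - k_j|}} - 1|$ being
multiplied over the $(m - 1)$ edges of the tree. Thus, we have

$$\sum_{k_1=1}^{n} \cdots \sum_{k_m=1}^{n} |\mathrm{cum}(e^{h_{k_1}}, \ldots, e^{h_{k_m}})|
\le K \sum_{k_1=1}^{n} \cdots \sum_{k_m=1}^{n} \sum_{G_{\{k_1, \ldots, k_m\}}} \Big\{ \prod_{(k_i, k_j) \in \Omega(G_{\{k_1, \ldots, k_m\}})} |e^{r_{|k_i - k_j|}} - 1| \Big\}, \quad (K > 0)$$
$$= K \sum_{G_{\{k_1, \ldots, k_m\}}} \sum_{k_1=1}^{n} \cdots \sum_{k_m=1}^{n} \Big\{ \underbrace{\prod_{(k_i, k_j) \in \Omega(G_{\{k_1, \ldots, k_m\}})} |e^{r_{|k_i - k_j|}} - 1|}_{(m-1) \text{ terms}} \Big\}$$

where $\Omega(G_{\{k_1, \ldots, k_m\}})$ is the set of edges of the graph indexed by $G_{\{k_1, \ldots, k_m\}}$.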

By Lemma 3, each entry of the summation over $G_{\{k_1, \ldots, k_m\}}$ is of order $O(n^{2dm - 2d + 1})$. Also this
summation is taken over a finite number of graphs ($W_{\{k_1, \ldots, k_m\}} < \infty$), therefore

$$\sum_{k_1=1}^{n} \cdots \sum_{k_m=1}^{n} |\mathrm{cum}(e^{h_{k_1}}, \ldots, e^{h_{k_m}})| = O(n^{2dm - 2d + 1}).$$

Because the normalization term in (6) is of order $O(n^{m(d+\frac{1}{2})})$, the dominant contribution to
$\mathrm{cum}(\underbrace{y_n, \ldots, y_n}_{m \text{ terms}})$ from Group 1 converges to zero, for any $m > 2$.

b) For Group 2, the symbols $e^{h_{k_1}}, \ldots, e^{h_{k_m}}$ are partitioned into two sets. Thus, the partitions with
each of the $m$ symbols $\epsilon_{k_1}, \ldots, \epsilon_{k_m}$ in a set by itself are not indecomposable. Relabel the two sets as
$\{e^{h_{g_1}}, \ldots, e^{h_{g_q}}\}$, $\{e^{h_{g_{q+1}}}, \ldots, e^{h_{g_m}}\}$. Since the partition must be indecomposable, there must be one
$I \in (1, \ldots, q)$ and one $J \in (q + 1, \ldots, m)$, such that $g_I = g_J$. The dominant contribution to (6) from
Group 2 is therefore

$$\frac{1}{n^{m(d+\frac{1}{2})}} \sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} |\mathrm{cum}(e^{h_{g_1}}, \ldots, e^{h_{g_q}})|\, |\mathrm{cum}(e^{h_{g_{q+1}}}, \ldots, e^{h_{g_m}})|\, |\mathrm{cum}(\epsilon_{g_I}, \epsilon_{g_J})|. \qquad (7)$$

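Similarly as above, after applying Corollary 5.3 of Surgailis and Viano (2002) and after bounding
certain terms, we obtain

$$\sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} |\mathrm{cum}(e^{h_{g_1}}, \ldots, e^{h_{g_q}})|\, |\mathrm{cum}(e^{h_{g_{q+1}}}, \ldots, e^{h_{g_m}})|\, |\mathrm{cum}(\epsilon_{g_I}, \epsilon_{g_J})|$$
$$\le K \sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} \Big\{ \sum_{G_{\{g_1, \ldots, g_q\}}} \underbrace{\prod_{(g_i, g_j) \in \Omega(G_{\{g_1, \ldots, g_q\}})} |e^{r_{|g_i - g_j|}} - 1|}_{(q-1) \text{ terms}} \Big\}
\cdot \Big\{ \sum_{G_{\{g_{q+1}, \ldots, g_m\}}} \underbrace{\prod_{(g_i, g_j) \in \Omega(G_{\{g_{q+1}, \ldots, g_m\}})} |e^{r_{|g_i - g_j|}} - 1|}_{(m-q-1) \text{ terms}} \Big\} \big\{ |\mathrm{cum}(\epsilon_{g_I}, \epsilon_{g_J})| \big\}$$
$$= K \sum_{G_{\{g_1, \ldots, g_q\}}} \sum_{G_{\{g_{q+1}, \ldots, g_m\}}} \sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} \mathbf{1}\{g_I = g_J\}
\underbrace{\Big\{ \prod_{(g_i, g_j) \in \Omega(G_{\{g_1, \ldots, g_q\}})} |e^{r_{|g_i - g_j|}} - 1| \prod_{(g_i, g_j) \in \Omega(G_{\{g_{q+1}, \ldots, g_m\}})} |e^{r_{|g_i - g_j|}} - 1| \Big\}}_{(m-2) \text{ terms, denote as } \Gamma(g_1, \ldots, g_m :\, G_{\{g_1, \ldots, g_q\}},\, G_{\{g_{q+1}, \ldots, g_m\}})}$$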



As mentioned before, any graph $G_a$ in $G_{\{g_1, \ldots, g_q\}}$ and any graph $G_b$ in $G_{\{g_{q+1}, \ldots, g_m\}}$ can be represented
by trees with $q$ and $(m - q)$ vertices, respectively. Since for any two trees, the resulting structure obtained
by merging one vertex from each tree is again a tree, under the constraint $g_I = g_J$, there exists a graph $G_c$
in $G_{\{g_1, \ldots, g_{I-1}, g_{I+1}, \ldots, g_m\}}$, such that $G_c$ is obtained by merging $G_a$ and $G_b$ together at the vertex $g_I = g_J$.

Therefore, the numerical value of the term $\Gamma$ evaluated for graphs $G_a$ and $G_b$ and indices $\{g_1, \ldots, g_m\}$
with the constraint $g_I = g_J$ (which follows from the independence of the $\{\epsilon_{g_i}\}$) is equal to the value of the
term $\Psi$ (defined below) evaluated using the graph $G_c$ in $G_{\{g_1, \ldots, g_{I-1}, g_{I+1}, \ldots, g_m\}}$ and indices $\{g_1, \ldots, g_{I-1}, g_{I+1}, \ldots, g_m\}$
without any constraint on the values of these indices. After re-parameterizing $\{g_1, \ldots, g_{I-1}, g_{I+1}, \ldots, g_m\}$
by $\{l_1, \ldots, l_{m-1}\}$, we obtain

$$\sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} |\mathrm{cum}(e^{h_{g_1}}, \ldots, e^{h_{g_q}})|\, |\mathrm{cum}(e^{h_{g_{q+1}}}, \ldots, e^{h_{g_m}})|\, |\mathrm{cum}(\epsilon_{g_I}, \epsilon_{g_J})|$$
$$\le K \sum_{G_{\{l_1, \ldots, l_{m-1}\}}} \sum_{l_1=1}^{n} \cdots \sum_{l_{m-1}=1}^{n} \underbrace{\prod_{(l_i, l_j) \in \Omega(G_{\{l_1, \ldots, l_{m-1}\}})} |e^{r_{|l_i - l_j|}} - 1|}_{(m-2) \text{ terms, denote as } \Psi(l_1, \ldots, l_{m-1} :\, G_{\{l_1, \ldots, l_{m-1}\}})}$$
$$= O(n^{2d(m-2)+1})$$

where the final equality follows from Lemma 3.

The above $(m - 1)$-fold summation for Group 2 is of smaller order than the $m$-fold summation from
Group 1, which was $O(n^{2d(m-1)+1})$. Hence, the dominant contribution from Group 2 also converges to
zero.

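c) In general, for Group $L \in \{1, \ldots, m\}$, the symbols $e^{h_{k_1}}, \ldots, e^{h_{k_m}}$ are partitioned into $L$ sets. Relabel
the $L$ sets as $\{e^{h_{g_1}}, \ldots, e^{h_{g_{q_1}}}\}, \{e^{h_{g_{q_1+1}}}, \ldots, e^{h_{g_{q_2}}}\}, \ldots, \{e^{h_{g_{q_{L-1}+1}}}, \ldots, e^{h_{g_m}}\}$. Since the partition must be
indecomposable, there must be $L$ indices $\{I, J, \ldots, Z\}$, where $I \in (1, \ldots, q_1)$, $J \in (q_1 + 1, \ldots, q_2)$, $\ldots$, $Z \in
(q_{L-1} + 1, \ldots, m)$, such that $\underbrace{g_I = g_J = \ldots = g_Z}_{L \text{ terms}}$. The dominant contribution to (6) from Group $L$ is then,

$$\frac{1}{n^{m(d+\frac{1}{2})}} \sum_{g_1=1}^{n} \cdots \sum_{g_m=1}^{n} \underbrace{|\mathrm{cum}(e^{h_{g_1}}, \ldots, e^{h_{g_{q_1}}})| \cdots |\mathrm{cum}(e^{h_{g_{q_{L-1}+1}}}, \ldots, e^{h_{g_m}})|}_{L \text{ terms}}\; |\mathrm{cum}(\underbrace{\epsilon_{g_I}, \epsilon_{g_J}, \ldots, \epsilon_{g_Z}}_{L \text{ terms}})|. \qquad (8)$$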
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Similarly as before, we obtain 

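$$\sum_{g_1=1}^{n}\cdots\sum_{g_m=1}^{n} \underbrace{|\mathrm{cum}(e^{h_{g_1}},\ldots,e^{h_{g_{q_1}}})| \cdots |\mathrm{cum}(e^{h_{g_{q_{L-1}+1}}},\ldots,e^{h_{g_m}})|}_{L\ \text{terms}}\; |\mathrm{cum}(\underbrace{\epsilon_{g_I},\epsilon_{g_J},\ldots,\epsilon_{g_Z}}_{L\ \text{terms}})|$$

$$\le K \underbrace{\sum_{G_{\{g_1,\ldots,g_{q_1}\}}}\cdots\sum_{G_{\{g_{q_{L-1}+1},\ldots,g_m\}}}}_{L\text{-fold}}\; \sum_{g_1=1}^{n}\cdots\sum_{g_m=1}^{n} 1_{\{g_I = g_J = \cdots = g_Z\}}\; \underbrace{\prod_{(g_i,g_j)\in\Omega(G_{\{g_1,\ldots,g_{q_1}\}})} |e^{r_{|g_i-g_j|}}-1| \;\cdots \prod_{(g_i,g_j)\in\Omega(G_{\{g_{q_{L-1}+1},\ldots,g_m\}})} |e^{r_{|g_i-g_j|}}-1|}_{(m-L)\ \text{terms}}$$

$$\le K \sum_{G_{\{l_1,\ldots,l_{m-L+1}\}}} \sum_{l_1=1}^{n}\cdots\sum_{l_{m-L+1}=1}^{n} \underbrace{\prod_{(l_i,l_j)\in\Omega(G_{\{l_1,\ldots,l_{m-L+1}\}})} |e^{r_{|l_i-l_j|}}-1|}_{(m-L)\ \text{terms}}$$

$$= O(n^{2d(m-L)+1}),$$
by Lemma 3.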

The constraint $\underbrace{g_I = g_J = \cdots = g_Z}_{L\ \text{terms}}$ allows the re-parameterization from $\{g_1,\ldots,g_m\}$ to $\{l_1,\ldots,l_{m-L+1}\}$
and reduces the $m$-fold summation to an $(m-L+1)$-fold summation in the last inequality. It was
shown for Group 2 that the graph obtained by merging one vertex from each of any pair of trees is again
a tree. By induction, we obtain a tree by merging one vertex from each of $L \ge 2$ trees, which allows us
to apply Lemma 3 with $M = m - L + 1$ in the last step.

So, the dominant contribution from Group $L$ is $O(n^{2d(m-L)+1-m(d+1/2)})$, $(L = 1,\ldots,m)$. Since $d > 0$,
the dominant contribution from all groups occurs for $L = 1$. Finally, the dominant contribution from
Group 1 is $O(n^{2d(m-1)+1-m(d+1/2)})$, which tends to zero for $m > 2$ since $d < 1/2$. □
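
The exponent bookkeeping in the last step can be verified with exact arithmetic. The sketch below is only an illustration (the function name `group_exponent` and the sampled values of $d$ and $m$ are assumptions). It checks that the Group $L$ exponent $2d(m-L)+1-m(d+1/2)$ is largest at $L = 1$ when $d > 0$, and that at $L = 1$ it equals $(m-2)(d-1/2)$, which is negative for $m > 2$ and $d < 1/2$.

```python
from fractions import Fraction

def group_exponent(m, L, d):
    """Exponent of n in the Group L bound: 2d(m - L) + 1 - m(d + 1/2)."""
    return 2 * d * (m - L) + 1 - m * (d + Fraction(1, 2))

checks = []
for d in (Fraction(1, 10), Fraction(1, 4), Fraction(49, 100)):
    for m in range(3, 9):
        exps = [group_exponent(m, L, d) for L in range(1, m + 1)]
        largest_at_L1 = exps[0] == max(exps)           # L = 1 dominates when d > 0
        identity = exps[0] == (m - 2) * (d - Fraction(1, 2))
        negative = exps[0] < 0                         # so the bound tends to zero
        checks.append(largest_at_L1 and identity and negative)
print(all(checks))
```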

Lemma 2 For durations {t^} satisfying the assumptions of Theorem^ 

$$\limsup_{t} E[Z^4(t)] < \infty,$$

where Z(t) is defined by Equation 0).
Proof: By Chung (1974, Theorem 3.2.1, page 42), $E[Z^4(t)] \le 1 + \sum_{s=1}^{\infty} P[Z^4(t) > s]$. Thus, it
suffices to show that

$$\limsup_{t} \sum_{s=1}^{\infty} P[Z^4(t) > s] < \infty. \tag{9}$$
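
The tail-sum bound behind this step, $E[X] \le 1 + \sum_{s \ge 1} P[X > s]$ for a nonnegative random variable $X$, can be illustrated on a small discrete distribution. This is a sketch only: the distribution and the helper name `mean_and_tail_bound` are made up for the illustration and play the role of $X = Z^4(t)$.

```python
from fractions import Fraction

def mean_and_tail_bound(pmf):
    """pmf maps nonnegative values x to probabilities P[X = x]."""
    mean = sum(x * p for x, p in pmf.items())
    s_max = int(max(pmf))  # P[X > s] = 0 for every integer s >= the largest value
    tail_sum = sum(
        sum(p for x, p in pmf.items() if x > s) for s in range(1, s_max + 1)
    )
    return mean, 1 + tail_sum

# Hypothetical distribution of a nonnegative random variable
pmf = {Fraction(0): Fraction(1, 4), Fraction(3, 2): Fraction(1, 4),
       Fraction(4): Fraction(3, 8), Fraction(10): Fraction(1, 8)}
mean, bound = mean_and_tail_bound(pmf)
print(mean, bound, mean <= bound)
```

Because the bound holds for every fixed $t$, controlling the tail sum uniformly in $t$ is enough to control the fourth moment.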

Note that for any real $k$,

$$N(t) \ge k \iff \sum_{i=1}^{\lfloor k \rfloor} u_i \le t. \tag{10}$$
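
For integer $k$, the relation between counts and partial sums of durations can be checked by simulation. The sketch below assumes exponential durations purely for illustration (the paper's durations are dependent; the identity itself is pathwise and does not depend on the distribution). The names `count` and `duality_holds` are hypothetical.

```python
import random
from itertools import accumulate

random.seed(0)
durations = [random.expovariate(1.0) for _ in range(200)]  # hypothetical u_1, u_2, ...
arrival_times = list(accumulate(durations))                # partial sums u_1 + ... + u_k

def count(t):
    """N(t): number of arrival times in (0, t]."""
    return sum(1 for s in arrival_times if s <= t)

def duality_holds(t, k):
    """Check N(t) >= k  <=>  u_1 + ... + u_k <= t for an integer k >= 1."""
    return (count(t) >= k) == (arrival_times[k - 1] <= t)

ok = all(duality_holds(t, k) for t in (0.5, 3.0, 10.0, 50.0) for k in range(1, 101))
print(ok)
```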

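We have

$$P[Z^4(t) > s] = P[Z(t) < -s^{1/4}] + P[Z(t) > s^{1/4}]. \tag{11}$$
Consider the second term $P[Z(t) > s^{1/4}]$. Using (10), we obtain

$$P[Z(t) > s^{1/4}] = P\Big[N(t) > \frac{t}{\mu} + s^{1/4} t^{1/2+d}\Big] = P\Big(\sum_{i=1}^{\lfloor g(t,s) \rfloor} u_i \le t\Big),$$

where $g(t,s) = \frac{t}{\mu} + s^{1/4} t^{1/2+d}$.

So,

$$P[Z(t) > s^{1/4}] = P\Big(\frac{\sum_{i=1}^{\lfloor g(t,s) \rfloor}(u_i - \mu)}{\lfloor g(t,s) \rfloor^{1/2+d}} \le \frac{t}{\lfloor g(t,s) \rfloor^{1/2+d}} - \frac{\mu \lfloor g(t,s) \rfloor}{\lfloor g(t,s) \rfloor^{1/2+d}}\Big).$$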



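Denote

$$U = \frac{\sum_{i=1}^{\lfloor g(t,s) \rfloor}(u_i - \mu)}{\lfloor g(t,s) \rfloor^{1/2+d}}.$$

Since $-\lfloor x \rfloor < -x + 1$ for $x > 0$, we obtain for any positive $p$,

$$P[Z(t) > s^{1/4}] \le P\Big(U \le \frac{-\mu s^{1/4} t^{1/2+d} + \mu}{\lfloor g(t,s) \rfloor^{1/2+d}}\Big) \le P\Big(|U| \ge \frac{\mu (s^{1/4} t^{1/2+d} - 1)}{g(t,s)^{1/2+d}}\Big) \le E(|U|^p)\, \frac{\big(\frac{t}{\mu} + s^{1/4} t^{1/2+d}\big)^{(1/2+d)p}}{\mu^p\, [s^{1/4} t^{1/2+d} - 1]^p}.$$

For $t \ge 4$, since $s^{1/4} t^{1/2+d} - 1 > 0$ and $\frac{1}{2} + d < 1$, we obtain

$$P[Z(t) > s^{1/4}] \le \frac{K\, E(|U|^p)}{s^{(p/4)(1/2-d)}}. \tag{12}$$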
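
The power of $s$ in this bound can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes the factor multiplying $E(|U|^p)$ to be $g(t,s)^{(1/2+d)p} / (\mu^p [s^{1/4}t^{1/2+d}-1]^p)$ with $g(t,s) = t/\mu + s^{1/4}t^{1/2+d}$, and compares it with $K s^{-(p/4)(1/2-d)}$ for one explicit, non-sharp constant $K = (1/\mu+1)^{(1/2+d)p}(2/\mu)^p$. That constant comes from $g(t,s) \le (1/\mu+1)s^{1/4}t$ and $s^{1/4}t^{1/2+d}-1 \ge \frac{1}{2}s^{1/4}t^{1/2+d}$, both valid for $t \ge 4$, $s \ge 1$. The values of $d$, $\mu$, $p$ are arbitrary.

```python
def markov_ratio(t, s, d, mu, p):
    """g(t,s)^{(1/2+d)p} / (mu^p (s^{1/4} t^{1/2+d} - 1)^p), the factor on E|U|^p."""
    x = s ** 0.25 * t ** (0.5 + d)
    g = t / mu + x
    return g ** ((0.5 + d) * p) / (mu ** p * (x - 1.0) ** p)

def envelope(s, d, mu, p):
    """K * s^{-(p/4)(1/2 - d)} with the explicit (non-sharp) constant K from the lead-in."""
    K = (1.0 / mu + 1.0) ** ((0.5 + d) * p) * (2.0 / mu) ** p
    return K * s ** (-(p / 4.0) * (0.5 - d))

d, mu, p = 0.3, 1.7, 24.0   # illustrative values; (p/4)(1/2 - d) = 1.2 > 1
ok = all(
    markov_ratio(t, s, d, mu, p) <= envelope(s, d, mu, p)
    for t in (4.0, 10.0, 1e3, 1e6)
    for s in (1.0, 2.0, 50.0, 1e4)
)
print(ok)
```

The envelope does not depend on $t$, which is what makes the resulting tail bound uniform in $t$.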



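Now, consider

$$P[Z(t) < -s^{1/4}] = P\Big[N(t) < \frac{t}{\mu} - s^{1/4} t^{1/2+d}\Big].$$

Let $a(t) = (t^{1/2-d}/\mu)^4$ and $v(t,s) = \frac{t}{\mu} - s^{1/4} t^{1/2+d}$. Using (10), we have

$$P[Z(t) < -s^{1/4}] = \begin{cases} P\Big(\sum_{i=1}^{\lfloor v(t,s) \rfloor} u_i > t\Big), & s < a(t), \\ P(u_1 > t), & s = a(t), \\ 0, & s > a(t). \end{cases}$$

For $s < a(t)$, we have $v(t,s) > 0$. Let

$$W = \frac{\sum_{i=1}^{\lfloor v(t,s) \rfloor}(u_i - \mu)}{\lfloor v(t,s) \rfloor^{1/2+d}}.$$

Then for any positive $p$,

$$P[Z(t) < -s^{1/4}] \le P\Big(W > \frac{\mu s^{1/4} t^{1/2+d}}{\big(\frac{t}{\mu} - s^{1/4} t^{1/2+d}\big)^{1/2+d}}\Big) \le \frac{\big(\frac{t}{\mu} - s^{1/4} t^{1/2+d}\big)^{(1/2+d)p}}{\mu^p\, s^{p/4}\, t^{(1/2+d)p}}\, E(|W|^p).$$

Thus,

$$P[Z(t) < -s^{1/4}] \le \frac{K\, E(|W|^p)}{s^{p/4}}. \tag{13}$$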

For s = ait), P[Z(t) < -s 1 / 4 ] = P[ui > t] < SiSll. 
For $s > a(t)$, $P[Z(t) < -s^{1/4}] = 0$.

Select any positive $p$ such that $\frac{p}{4}(\frac{1}{2} - d) > 1$, and thus $\frac{p}{4} > 1$ since $0 < \frac{1}{2} - d < 1$. If it can be
shown that $\sup_{t \ge 1, s \ge 1} E(|U|^p) < \infty$ and $\sup_{t \ge 1, s \ge 1} E(|W|^p) < \infty$, then by (12) and (13), it follows that
$P[Z^4(t) > s]$ is summable, uniformly in $t$. Thus, (9) follows and the proof is complete.

We next show that indeed $\sup_{t \ge 1, s \ge 1} E(|U|^p) < \infty$ and $\sup_{t \ge 1, s \ge 1} E(|W|^p) < \infty$ for all positive $p$
when $d \in (0, \frac{1}{2})$ and for $p = 8 + \delta$, $\delta > 0$ when $d = 0$. Define

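$$B_1 = \frac{u_1 - \mu}{\lfloor g(t,s) \rfloor^{1/2+d}}, \qquad B_2 = \frac{\sum_{i=2}^{\lfloor g(t,s) \rfloor}(\tau_i - \mu)}{\lfloor g(t,s) \rfloor^{1/2+d}},$$
so that $U = B_1 + B_2$. By Minkowski's Inequality,

$$E[|U|^p] \le \big[(E|B_1|^p)^{1/p} + (E|B_2|^p)^{1/p}\big]^p.$$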

Since u\ < n, using /i(a;) = (a; + /i) p in (|T|> . and since by assumption w), ri has all finite moments up to 
order p under P° , we have 

$$\sup_{t \ge 1,\, s \ge 1} E|B_1|^p < \infty.$$

From Baccelli and Bremaud (2003, Equation 1.2.25, page 20), we have that for any measurable function $h$,

$$E[h(\tau_2,\ldots,\tau_n)] = \lambda E^0[\tau_1 h(\tau_2,\ldots,\tau_n)].$$

This, together with the Cauchy-Schwarz inequality, yields

$$E|B_2|^p = \lambda E^0(\tau_1 |B_2|^p) \le \lambda [E^0(\tau_1^2)]^{1/2} [E^0|B_2|^{2p}]^{1/2},$$

where $\lambda = 1/E^0(\tau_1)$. By assumption iv), $\sup_{t \ge 1, s \ge 1} E^0|B_2|^p < \infty$ for all positive $p$ when $d \in (0, \frac{1}{2})$
and for $p = 8 + \delta$, $\delta > 0$ when $d = 0$. It follows that $\sup_{t \ge 1, s \ge 1} E[|U|^p] < \infty$. By a similar argument,
$\sup_{t \ge 1, s \ge 1} E[|W|^p] < \infty$. □

Lemma 3 For any $M \ge 2$ and $0 < d < \frac{1}{2}$,

$$\underbrace{\sum_{k_1=1}^{n}\cdots\sum_{k_M=1}^{n}}_{M\text{-fold}} \Big\{ \underbrace{\prod_{(k_i,k_j)\in\Omega(G)} |e^{r_{|k_i-k_j|}} - 1|}_{(M-1)\ \text{terms}} \Big\} = O(n^{2d(M-1)+1}) \tag{14}$$

where $\Omega(G)$ is the set of edges of $G$, $G$ is any connected $M$-vertex graph with vertices $\{k_1,\ldots,k_M\}$ and
$(M-1)$ edges; $r_{|k_i-k_j|} = \mathrm{cov}(h_{k_i}, h_{k_j})$, $1 \le i \le M$, $1 \le j \le M$; $\{h_k\}$ is a long memory process with
memory parameter $d$.

Proof: Since G is a connected graph with M vertices and (M— 1) edges, it can be represented as a tree 
(see Andrasfai 1977, Chapter 2). The tree representation is not unique. Fix a particular representation. 
Then there is one vertex with no parent, called the root. A vertex with both a parent and a child is 
called a node. A vertex with no child is called a leaf. 

We proceed iteratively. First, select any leaf vertex. By definition of a leaf, the corresponding index 
only appears once in the product, so the sum on this index can be evaluated for this term only, holding 
the other terms fixed. Since $r_s \sim C s^{2d-1}$ as $s \to \infty$, we have for any fixed integer $i$ with $1 \le i \le n$,
$$\sum_{k=1}^{n} |e^{r_{|k-i|}} - 1| = O(n^{2d}).$$

It follows that the sum on the first index is 0(n 2d ). Next, delete the leaf just used from the tree. The 
resulting graph is again a tree. Repeat the process of selecting a leaf, performing the corresponding sum 
and deleting the leaf until only the root remains. The $M$-fold sum in (14) is now bounded by a constant
times the sum of $n$ terms each of which is $O(n^{2d(M-1)})$. Thus, the sum in (14) is $O(n^{2d(M-1)+1})$. □
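
The leaf-by-leaf evaluation used in this proof can be sketched in code. The example below is illustrative only: `weight` is a stand-in for $|e^{r_s} - 1|$ under an assumed covariance decay of order $s^{2d-1}$, and the 4-vertex tree, `parent` encoding, and function names are hypothetical. It evaluates the $M$-fold sum by repeatedly summing out a leaf and compares the result with the direct $M$-fold sum.

```python
import itertools

def brute_force_sum(n, edges, M, weight):
    """Direct M-fold sum of the product of edge weights over all index tuples."""
    total = 0.0
    for ks in itertools.product(range(1, n + 1), repeat=M):
        term = 1.0
        for a, b in edges:
            term *= weight(abs(ks[a] - ks[b]))
        total += term
    return total

def leaf_elimination_sum(n, parent, weight):
    """Same sum for a rooted tree; parent[v] < v is the parent of v, parent[0] is None.
    Processing vertices from last to first always removes a current leaf."""
    M = len(parent)
    # message[v][k] = sum over the subtree below v, given that vertex v has index k
    message = [[1.0] * (n + 1) for _ in range(M)]
    for v in range(M - 1, 0, -1):
        u = parent[v]
        for k in range(1, n + 1):
            inner = sum(weight(abs(k - j)) * message[v][j] for j in range(1, n + 1))
            message[u][k] *= inner
    return sum(message[0][1:])

d = 0.3
def weight(s):
    # stand-in for |exp(r_s) - 1|, assuming a covariance decaying like s^(2d-1)
    return (1.0 + s) ** (2 * d - 1)

parent = [None, 0, 0, 1]             # a 4-vertex tree with edges (1,0), (2,0), (3,1)
edges = [(v, parent[v]) for v in range(1, len(parent))]
n = 6
fast = leaf_elimination_sum(n, parent, weight)
slow = brute_force_sum(n, edges, len(parent), weight)
print(abs(fast - slow) < 1e-9 * slow)
```

Each eliminated leaf contributes one inner sum over a single index, which mirrors the $O(n^{2d})$ factor per leaf in the proof; the final sum over the root index supplies the remaining factor of $n$.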

Lemma 4 Under the LMSD model with memory parameter $d \in [0, \frac{1}{2})$ described in the corresponding theorem, $P^0$ is $\{\tau_k\}$-mixing.
The durations $\{\tau_k\}$ generated by the ACD(1,1) model described in the corresponding theorem are exponential $\alpha$-mixing.

Proof: Under $P^0$, $\{h_k\}$ is a stationary Gaussian process with a log spectral density having an integral
on $[-\pi, \pi]$ that is greater than $-\infty$, so that the innovation variance is positive. Since Gaussian processes
are time reversible, it follows that we can represent $h_k = \sum_{j \ge 0} a_j w_{k+j}$ where $\sum_{j \ge 0} a_j^2 < \infty$ and $\{w_k\}$ is an
iid Gaussian sequence. Arguing as in the proof of Theorem 17.3.1 of Ibragimov and Linnik (1971), pp.
311-312, replacing $\{\ldots, w_{k-1}, w_k\}$ by $\{w_k, w_{k+1}, \ldots\}$, it follows that $P^0$ is $\{h_k\}$-mixing. Since the $\{\epsilon_k\}$
are iid it follows that $P^0$ is also $\{\epsilon_k\}$-mixing. Since for any process $\{\xi_k\}$, $P^0$ is $\{\xi_k\}$-mixing if and only
if the future tail $\sigma$-field of $\{\xi_k\}$ is trivial (see, e.g., Nieuwenhuis (1989), Equation (3.3)), it follows from
Lemma 5 that $P^0$ is $\{\tau_k\}$-mixing, where $\tau_k = e^{h_k} \epsilon_k$.

For the ACD(1,1) model, by Proposition 17 of Carrasco and Chen (2002), $\{\tau_k\}$ is exponential $\beta$-mixing
(also called absolutely regular) if $\{\tau_0, \psi_0\}$ are initialized from the stationary distribution. Their result
still holds for a doubly infinite sequence $\{\tau_k\}$, $k \in (-\infty, \infty)$. It is well known that $\beta$-mixing implies
$\alpha$-mixing (or strong mixing) (see Bradley (2005), Section 2.1). Therefore, $\{\tau_k\}$ is also exponential $\alpha$-mixing,
which further implies $\{\tau_k\}$-mixing of $P^0$ for the ACD(1,1) model; see Nieuwenhuis (1989), Equation (3.5).
□

Lemma 5 Let $\{\xi_s\}$ and $\{\zeta_s\}$ be two independent processes whose future tail $\sigma$-fields are trivial. Then
the future tail $\sigma$-field of the process $\{\xi_s, \zeta_s\}$ is trivial.

Proof: Define $\mathcal{F}_t = \sigma(\xi_s, s \ge t)$, $\mathcal{G}_t = \sigma(\zeta_s, s \ge t)$ and $\mathcal{H}_t = \sigma(\xi_s, \zeta_s, s \ge t)$. As pointed out by
Ibragimov and Linnik (1971, p. 303) (for regularity), to prove that $\mathcal{H}_\infty$ is trivial, it suffices to prove that
for all $\mathcal{H}_0$-measurable zero mean random variables $\eta$ such that $E[\eta^2] \le 1$, $E[\eta \mid \mathcal{H}_t]$ converges to 0 in
quadratic mean. By standard arguments, it suffices to prove this for a random variable $\eta$ that can be
expressed as $\eta = \eta_1 \eta_2$ with $\eta_1$ $\mathcal{F}_0$-measurable and $\eta_2$ $\mathcal{G}_0$-measurable and, without loss of generality, both
with zero mean. Then, by independence of $\{\xi_s\}$ and $\{\zeta_s\}$,

$$E[\eta \mid \mathcal{H}_t] = E[\eta_1 \mid \mathcal{F}_t] \times E[\eta_2 \mid \mathcal{G}_t].$$

Since $\mathcal{F}_\infty$ and $\mathcal{G}_\infty$ are trivial, both terms on the right-hand side above tend to 0 in q.m. By independence,
their product also tends to 0 in q.m. □
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